A Readability Checker with Supervised Learning Using Deep Indicators
Abstract
Checking the readability or simplicity of texts is important for many institutional and individual users. Formulas for approximately measuring text readability have a long tradition. Usually, they exploit surface-oriented indicators such as sentence length, word length, and word frequency. In many cases, however, this information is not adequate to realistically approximate the cognitive difficulties a person can have in understanding a text. Therefore, we additionally use deep syntactic and semantic indicators. The syntactic information is represented by a dependency tree, the semantic information by a semantic network. Both representations are automatically generated by a deep syntactico-semantic analysis. A global readability score is determined by applying a nearest neighbor algorithm to 3,000 ratings from 300 test persons. The evaluation showed that the deep syntactic and semantic indicators lead to promising results comparable to the best surface-based indicators. Combining deep and shallow indicators improves on shallow indicators alone. Finally, a graphical user interface was developed that highlights difficult passages, depending on the individual indicator values, and displays a global readability score.
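The nearest neighbor step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes only two surface indicators (average sentence length and average word length), plain Euclidean distance, and a handful of made-up ratings on a hypothetical scale. The names `surface_indicators`, `knn_score`, and `TOY_RATINGS` are illustrative, and the deep syntactic and semantic indicators of the paper are not modeled here.

```python
import math
import re


def surface_indicators(text):
    """Return (average sentence length in words, average word length in characters)."""
    # Naive sentence split on ., ! and ?; a real system would use a proper tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters only (digits and underscores are excluded).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        return (0.0, 0.0)
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return (avg_sentence_len, avg_word_len)


def knn_score(query, rated_examples, k=3):
    """Average the ratings of the k rated examples closest to `query`."""
    if not rated_examples:
        raise ValueError("at least one rated example is required")
    nearest = sorted(rated_examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    return sum(rating for _, rating in nearest) / len(nearest)


# Hypothetical toy data: (indicator vector, human rating). Not taken from the paper.
TOY_RATINGS = [
    ((8.0, 4.0), 1.0),
    ((12.0, 4.5), 2.0),
    ((25.0, 6.0), 5.0),
    ((30.0, 6.5), 6.0),
]

if __name__ == "__main__":
    features = surface_indicators("The cat sat. The dog ran.")
    print(features, knn_score(features, TOY_RATINGS, k=2))
```

In practice the indicators would need to be scaled to comparable ranges before computing distances, since otherwise the indicator with the largest numeric range dominates the neighbor selection.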
Related articles
A Readability Checker with Supervised Learning using Deep Syntactic and Semantic Indicators
Checking for readability or simplicity of texts is important for many institutional and individual users. Formulas for approximately measuring text readability have a long tradition. Usually, they exploit surface-oriented indicators like sentence length, word length, word frequency, etc. However, in many cases, this information is not adequate to realistically approximate the cognitive difficul...
A Semantically Oriented Readability Checker for German
One major reason that readability checkers are still far away from judging the understandability of texts consists in the fact that no semantic information is used. Syntactic, lexical, or morphological information can only give limited access for estimating the cognitive difficulties for a human being to comprehend a text. In this paper however, we present a readability checker which uses seman...
All Mixed Up? Finding the Optimal Feature Set for General Readability Prediction and Its Application to English and Dutch
Readability research has a long and rich tradition, but there has been too little focus on general readability prediction without targeting a specific audience or text genre. Moreover, although NLP-inspired research has focused on adding more complex readability features, there is still no consensus on which features contribute most to the prediction. In this article, we investigate in close de...
The Readability Checker Delite Technical Report
This report describes the DeLite readability checker which automatically assesses the linguistic accessibility of Web documents. The system computes readability scores for an arbitrary German text and highlights those parts of the text causing difficulties with regard to readability. The highlighting is done at different linguistic levels, beginning with surface effects closely connected to mor...
The Readability of Helpful Product Reviews
Consumers frequently rely on user-generated product reviews to guide purchasing decisions. Given the ever-increasing volume of such reviews and variations in review quality, consumers require assistance to effectively leverage this vast information source. In this paper, we examine to what extent the readability of reviews is a predictor of review helpfulness. Using a supervised classification a...
Journal:
- Informatica (Slovenia)
Volume 32, Issue
Pages -
Publication date: 2008